Exploiting 2D Floorplan for Building-scale Panorama RGBD Alignment
This paper presents a novel algorithm that utilizes a 2D floorplan to align
panorama RGBD scans. While effective panorama RGBD alignment techniques exist,
such a system requires extremely dense RGBD image sampling. Our approach can
significantly reduce the number of necessary scans with the aid of a floorplan
image. We formulate a novel Markov Random Field inference problem as a scan
placement over the floorplan, as opposed to the conventional scan-to-scan
alignment. The technical contributions lie in multi-modal image correspondence
cues (between scans and schematic floorplan) as well as a novel coverage
potential avoiding an inherent stacking bias. The proposed approach has been
evaluated on five challenging large indoor spaces. To the best of our
knowledge, we present the first effective system that utilizes a 2D floorplan
image for building-scale 3D pointcloud alignment. The source code and the data
will be shared with the community to further enhance indoor mapping research.
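The idea of placing scans over the floorplan, with a term that discourages stacking, can be illustrated with a small sketch. This is a toy under stated assumptions, not the paper's formulation: the unary costs, the candidate cells, and the constant pairwise `overlap_penalty` are all hypothetical stand-ins for the multi-modal correspondence cues and the coverage potential, and the exhaustive search replaces real MRF inference.

```python
from itertools import product

def placement_energy(assignment, unary, overlap_penalty):
    """Sum of per-scan unary costs plus a penalty for each pair of scans
    placed on the same floorplan cell (a crude stand-in for a coverage term)."""
    energy = sum(unary[i][cell] for i, cell in enumerate(assignment))
    for i in range(len(assignment)):
        for j in range(i + 1, len(assignment)):
            if assignment[i] == assignment[j]:
                energy += overlap_penalty
    return energy

def best_placement(unary, cells, overlap_penalty):
    """Exhaustively search all joint placements (feasible only for toy sizes)."""
    n_scans = len(unary)
    return min(
        product(cells, repeat=n_scans),
        key=lambda a: placement_energy(a, unary, overlap_penalty),
    )

# Hypothetical costs: both scans individually prefer cell "A".
unary = [{"A": 0.1, "B": 0.5}, {"A": 0.2, "B": 0.4}]
stacked = best_placement(unary, ["A", "B"], overlap_penalty=0.0)
spread = best_placement(unary, ["A", "B"], overlap_penalty=1.0)
```

Without the pairwise term both scans collapse onto the same cell; with it, the minimum-energy placement spreads them out, which is the kind of stacking bias the abstract refers to.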
Emergence of Intelligent Navigation Behavior in Embodied Agents from Massive-Scale Simulation
The goal of Artificial Intelligence is to build ‘thinking machines’ that ‘use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves.’ In this dissertation, we will argue that the intelligence required for this goal emerges from massive-scale simulation. We will show a specific case: that intelligent navigation behavior emerges from massive-scale simulation and deep reinforcement learning.
Towards this end, we introduce Decentralized Distributed PPO (DD-PPO), a method that scales reinforcement learning to multiple GPUs and machines. We use DD-PPO to train agents for PointGoal navigation (e.g. ‘Go 5 meters north and 10 meters east relative to start’) for the equivalent of 80 years of human experience. This massive-scale training results in near-perfect autonomous navigation in an unseen environment without access to a map. We then examine the inner workings of a special case of PointGoalNav agents. We find that (1) their memory enables shortcuts, i.e. they travel efficiently through previously unexplored parts of the environment; and (2) maps emerge in their memory, i.e. a detailed occupancy grid of the environment can be decoded from it.
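The decentralized, synchronous flavor of this kind of training can be sketched in a few lines. The sketch below assumes the usual pattern in which every worker computes a gradient on its own rollouts and all workers then apply the same averaged gradient, with no central parameter server; the function names and plain-list parameters are hypothetical, and a real system would use a collective all-reduce across GPUs rather than a Python loop.

```python
def all_reduce_mean(worker_grads):
    """Average per-worker gradients element-wise, as a collective
    all-reduce would do across processes."""
    n_workers = len(worker_grads)
    n_params = len(worker_grads[0])
    return [sum(g[k] for g in worker_grads) / n_workers for k in range(n_params)]

def synchronous_step(params, worker_grads, lr):
    """Every worker applies the same averaged gradient, so all replicas
    of the policy stay identical after each update."""
    mean_grad = all_reduce_mean(worker_grads)
    return [p - lr * g for p, g in zip(params, mean_grad)]

# Two hypothetical workers, each with a gradient from its own rollouts.
new_params = synchronous_step([1.0, 1.0], [[1.0, 2.0], [3.0, 4.0]], lr=0.5)
```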
We then introduce Variable Experience Rollout (VER), a method that efficiently scales reinforcement learning on a single GPU or machine. We use VER to train chained skills for mobile manipulation. We find a surprising emergence of navigation in skills that do not ostensibly require any navigation. Specifically, the pick skill involves a robot picking an object from a table. During training, the robot was always spawned close to the table and never needed to navigate. However, we find that if navigation actions are part of the action space, the robot learns to navigate and then pick an object in new environments with 50% success, demonstrating surprisingly high out-of-distribution generalization.
Ph.D.
PIRLNav: Pretraining with Imitation and RL Finetuning for ObjectNav
We study ObjectGoal Navigation -- where a virtual robot situated in a new
environment is asked to navigate to an object. Prior work has shown that
imitation learning (IL) using behavior cloning (BC) on a dataset of human
demonstrations achieves promising results. However, this has limitations -- 1)
BC policies generalize poorly to new states, since the training mimics actions
not their consequences, and 2) collecting demonstrations is expensive. On the
other hand, reinforcement learning (RL) is trivially scalable, but requires
careful reward engineering to achieve desirable behavior. We present PIRLNav, a
two-stage learning scheme for BC pretraining on human demonstrations followed
by RL-finetuning. This leads to a policy that achieves a success rate of
on ObjectNav ( absolute over previous state-of-the-art). Using
this BC→RL training recipe, we present a rigorous empirical
analysis of design choices. First, we investigate whether human demonstrations
can be replaced with 'free' (automatically generated) sources of
demonstrations, e.g. shortest paths (SP) or task-agnostic frontier exploration
(FE) trajectories. We find that BC→RL on human demonstrations
outperforms BC→RL on SP and FE trajectories, even when controlled
for the same BC-pretraining success on train, and even on a subset of val episodes
where BC-pretraining success favors the SP or FE policies. Next, we study how
RL-finetuning performance scales with the size of the BC pretraining dataset.
We find that as we increase the size of the BC-pretraining dataset and get to high
BC accuracies, improvements from RL-finetuning are smaller, and that of
the performance of our best BC→RL policy can be achieved with fewer
than half the number of BC demonstrations. Finally, we analyze failure modes of
our ObjectNav policies, and present guidelines for further improving them.
Comment: 8 pages + supplemen
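The two-stage scheme in the PIRLNav abstract (behavior cloning on demonstrations, then RL finetuning) can be illustrated on a toy problem. The sketch below is not the paper's model or training setup: it assumes a single-state, three-action bandit with a softmax policy, hypothetical demonstrations that mostly choose a decent but suboptimal action, and an exact policy gradient in place of sampled rollouts.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def bc_pretrain(logits, demo_actions, lr=0.1, epochs=200):
    """Stage 1: behavior cloning, i.e. gradient ascent on the
    log-likelihood of the demonstrated actions."""
    logits = list(logits)
    for _ in range(epochs):
        for a in demo_actions:
            probs = softmax(logits)
            for k in range(len(logits)):
                # d log pi(a) / d logit_k = 1[k == a] - pi(k)
                logits[k] += lr * ((1.0 if k == a else 0.0) - probs[k])
    return logits

def rl_finetune(logits, rewards, lr=0.5, steps=500):
    """Stage 2: exact policy-gradient ascent on expected reward."""
    logits = list(logits)
    for _ in range(steps):
        probs = softmax(logits)
        expected = sum(p * r for p, r in zip(probs, rewards))
        for k in range(len(logits)):
            # d E[r] / d logit_k = pi(k) * (r_k - E[r])
            logits[k] += lr * probs[k] * (rewards[k] - expected)
    return logits

def expected_reward(logits, rewards):
    return sum(p * r for p, r in zip(softmax(logits), rewards))

rewards = [0.0, 0.5, 1.0]   # hypothetical: action 2 is best
demos = [1, 1, 1, 2]        # hypothetical: demonstrators mostly pick action 1
bc_logits = bc_pretrain([0.0, 0.0, 0.0], demos)
rl_logits = rl_finetune(bc_logits, rewards)
```

In this toy, the cloned policy imitates the demonstrators' preference for action 1, and finetuning on the reward then shifts probability toward action 2. It mirrors the abstract's point that BC mimics actions rather than their consequences, while RL optimizes the outcome directly.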